
37 research outputs found

Evaluating Overfit and Underfit in Models of Network Community Structure

Author: Clauset Aaron
Ghasemian Amir
Hosseinmardi Homa
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared.

Comment: 22 pages, 13 figures, 3 tables

arXiv.org e-Print Archive

Crossref

Altmetrics Study On Research Outputs In Fields of Social Sciences In Top Iranian Universities

Author: Asnafi Amir Reza
Erfanmanesh Amin
Ghasemian Amir
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 24/04/2023

Purpose: The purpose of the present work was an altmetrics study of research outputs in the field of social and behavioral sciences at major Iranian universities during 2010-2020. Methodology: The research outputs of major Iranian universities in the social and behavioral sciences that are indexed in the Scopus database were reviewed. This applied research was conducted with an altmetrics approach. Scopus and Altmetrics Explorer databases were used to collect data. Data analysis was performed using descriptive and inferential statistical tests in Excel software. Findings: The study revealed that Shahid Beheshti, Tehran, Tarbiat Modares, Tabriz, and Shiraz universities ranked highest in the field of social sciences in terms of mentions and bookmarks. In addition, in all the universities surveyed, the most mentions were on Twitter and the most bookmarks were on Mendeley. Conclusion: Overall, the findings showed that most of the surveyed universities were not in an acceptable position in terms of social media presence and altmetrics score, indicating the corresponding researchers' lack of familiarity with the benefits of social media and their low participation in sharing their research outputs on social media.

DigitalCommons@University of Nebraska

Detectability thresholds and optimal algorithms for community structure in dynamic networks

Author: Clauset Aaron
Ghasemian Amir
Moore Cristopher
Peel Leto
Zhang Pan
Publication venue: American Physical Society (APS)
Publication date: 19/06/2015

We study the fundamental limits on learning latent community structure in dynamic networks. Specifically, we study dynamic stochastic block models where nodes change their community membership over time, but where edges are generated independently at each time step. In this setting (which is a special case of several existing models), we are able to derive the detectability threshold exactly, as a function of the rate of change and the strength of the communities. Below this threshold, we claim that no algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this limit. The first uses belief propagation (BP), which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the BP equations. We verify our analytic and algorithmic results via numerical simulation, and close with a brief discussion of extensions and open questions.

Comment: 9 pages, 3 figures

arXiv.org e-Print Archive

Maastricht University Research Portal

CU Scholar Institutional Repository

Directory of Open Access Journals

DIAL UCLouvain


Limits of Model Selection, Link Prediction, and Community Detection

Author: Ghasemian Amir
Publication venue: University of Colorado Boulder
Publication date: 01/01/2019

Relational data has become increasingly ubiquitous. Networks are rich tools from graph theory that represent real-world interactions through a simple abstract graph of nodes and edges. Network analysis and modeling has gained wide attention from researchers in various disciplines, such as computer science, social science, biology, economics, electrical engineering, and physics. Network analysis is the study of network topology to answer a variety of application-based questions about the original real-world problem. For example, in social network analysis the questions concern how people interact with each other in online social networks or in collaboration networks, how diseases propagate or how information flows through a network, or how to control a disease or food outbreak. In electric networks like power grids, or in internet networks, the questions can concern vulnerability assessment of the networks, in preparation for a power outage or internet blackout. In biological network analysis, the questions concern how different diseases are related to each other, which can be useful in discovering new symptoms of diseases and in developing new medicines. The importance of this interdisciplinary area of science is due to its widespread applications, which involve scientists and researchers with a variety of backgrounds and interests. Although networks are much simpler than the original complex systems, the interactions among the nodes in a real-world network may seem random, and capturing patterns in them is not trivial. There are many open questions about inference on networks, which makes this topic very attractive for researchers in the field. In this dissertation we answer some of these questions in two lines of study: one focused on experimental analyses and one focused on theoretical limitations.
In Chapter 2 we look at community detection, a common graph mining task in network inference, which seeks an unsupervised decomposition of a network into groups based on statistical regularities in network connectivity. Although many such algorithms exist, community detection’s No Free Lunch theorem implies that no algorithm can be optimal across all inputs. However, little is known in practice about how different algorithms over or underfit to real networks, or how to reliably assess such behavior across algorithms. We present a broad investigation of over and underfitting across 16 state-of-the-art community detection algorithms applied to a novel benchmark corpus of 572 structurally diverse real-world networks. We find that (i) algorithms vary widely in the number and composition of communities they find, given the same input; (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks; (iii) algorithmic differences induce wide variation in accuracy on link-based learning tasks; and (iv) no algorithm is always the best at such tasks across all inputs. Finally, we quantify each algorithm’s overall tendency to over or underfit to network data using a theoretically principled diagnostic, and discuss the implications for future advances in community detection. In Chapter 3 we investigate the link prediction problem, another important inference task in complex networks with a wide variety of applications. As we observed in Chapter 2, algorithmic differences in community detection induce wide variation in accuracy on link prediction tasks. Moreover, although many link prediction techniques exist in the literature, there is still a lack of methodology to analyze and compare them. In Chapter 3, we provide a methodological overview of link prediction techniques and present new results on optimal link prediction and on transfer learning for link prediction. In the former, we investiga

CU Scholar Institutional Repository

New caerin-like antibacterial peptide from the venom gland of the Iranian scorpion Mesobuthus eupeus: cDNA amplification and sequence analysis

Author: Baradaran Masoumeh
Ghasemian Sepideh
Jalali Amir
Jolodar Abbas
Publication venue: African Journals Online (AJOL)
Publication date: 15/01/2016

Scorpion venom consists of different types of peptides and proteins which are encoded by individual genes. A full-length cDNA consisting of 238 base pairs and encoding a 74-amino-acid peptide was isolated from the venom gland of the Iranian scorpion Mesobuthus eupeus (Buthidae family). This peptide, named M. eupeus caerin-like antimicrobial peptide (Me-CLAP), belongs to a group of antibacterial peptides previously described from scorpions. In this study, the cDNA sequence encoding Me-CLAP from M. eupeus venom glands was amplified using reverse transcriptase polymerase chain reaction (RT-PCR) and then analyzed. Me-CLAP has molecular characteristics similar to antimicrobial peptides (AMPs) of the same genus, such as those of Mesobuthus martensii and M. eupeus, while more differences were seen with other genera.

Keywords: Caerin-like antimicrobial peptide, Mesobuthus eupeus, semi-nested real-time polymerase chain reaction

AJOL - African Journals Online

Evaluating the scale, growth, and origins of right-wing echo chambers on YouTube

Author: Clauset Aaron
Ghasemian Amir
Hosseinmardi Homa
Mobius Markus
Rothschild David M.
Watts Duncan J.
Publication date: 25/11/2020

Although it is understudied relative to other social media platforms, YouTube is arguably the largest and most engaging online media consumption platform in the world. Recently, YouTube's outsize influence has sparked concerns that its recommendation algorithm systematically directs users to radical right-wing content. Here we investigate these concerns with large scale longitudinal data of individuals' browsing behavior spanning January 2016 through December 2019. Consistent with previous work, we find that political news content accounts for a relatively small fraction (11%) of consumption on YouTube, and is dominated by mainstream and largely centrist sources. However, we also find evidence for a small but growing "echo chamber" of far-right content consumption. Users in this community show higher engagement and greater "stickiness" than users who consume any other category of content. Moreover, YouTube accounts for an increasing fraction of these users' overall online news consumption. Finally, while the size, intensity, and growth of this echo chamber present real concerns, we find no evidence that they are caused by YouTube recommendations. Rather, consumption of radical content on YouTube appears to reflect broader patterns of news consumption across the web. Our results emphasize the importance of measuring consumption directly rather than inferring it from recommendations.

Comment: 29 pages, 21 figures, 15 tables

arXiv.org e-Print Archive

Causally estimating the effect of YouTube's recommender system using counterfactual bots

Author: Ghasemian Amir
Hosseinmardi Homa
Ribeiro Manoel Horta
Rivera-Lanas Miguel
Watts Duncan J.
West Robert
Publication date: 20/08/2023

In recent years, critics of online platforms have raised concerns about the ability of recommendation algorithms to amplify problematic content, with potentially radicalizing consequences. However, attempts to evaluate the effect of recommenders have suffered from a lack of appropriate counterfactuals -- what a user would have viewed in the absence of algorithmic recommendations -- and hence cannot disentangle the effects of the algorithm from a user's intentions. Here we propose a method that we call "counterfactual bots" to causally estimate the role of algorithmic recommendations on the consumption of highly partisan content. By comparing bots that replicate real users' consumption patterns with "counterfactual" bots that follow rule-based trajectories, we show that, on average, relying exclusively on the recommender results in less partisan consumption, where the effect is most pronounced for heavy partisan consumers. Following a similar method, we also show that if partisan consumers switch to moderate content, YouTube's sidebar recommender "forgets" their partisan preference within roughly 30 videos regardless of their prior history, while homepage recommendations shift more gradually towards moderate content. Overall, our findings indicate that, at least on YouTube, individual consumption patterns mostly reflect individual preferences, where algorithmic recommendations play, if anything, a moderating role

arXiv.org e-Print Archive

Environmental Impact Assessment of the Industrial Estate Development Plan with the Geographical Information System and Matrix Methods

Author: Amin Mohammad Mehdi
Ghasemian Mohammad
Ghoddousi Hamid
Momeni Seyyed Alireza
Poursafa Parinaz
Rezaei Amir Hossein
Ziarati Mohammad
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2012

Background. The purpose of this study is an environmental impact assessment of an industrial estate development plan. Methods. This cross-sectional study was conducted in 2010 in Isfahan province, Iran. GIS and matrix methods were applied. Data analysis was done to identify the current situation of the region, zone vulnerable areas, and scope the region. Quantitative evaluation was done using the matrix of Wooten and Rau. Results. The net score for the impact of industrial unit operation on air quality in the project area was (−3). Based on the transport of industrial estate pollutants, residential places located within a radius of 2500 meters of the city were expected to be affected more. The net score for the impact of construction of industrial units on plant species in the project area was (−2). Environmental protected areas were not affected by the air and soil pollutants because of their distance from the industrial estate. Conclusion. Positive effects of project activities outweigh the drawbacks, and the sum of the scores allocated to the project activities on environmental factors was (+37). Overall, the project does not have detrimental effects on the environment or the residential neighborhood. EIA should be considered as an anticipatory, participatory environmental management tool before determining a plan application.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central